davepacheco (Collaborator) commented Oct 10, 2025

This is a first step towards including Reconfigurator state in support bundles. I wanted callers to be able to cap how big these files could be so that the process doesn't take an unbounded amount of time or space. Then I realized that this also allows folks to use omdb reconfigurator archive to clean up enormous numbers of blueprints in batches by just re-running the tool with whatever limit they want.

There are two things less than ideal about this:

  • This always saves the latest N blueprints. It doesn't pay attention to whether any are the target or are in the target history. It might not even save the target (though this would be very unlikely).
  • As a result, when archiving and hitting the limit, it deletes whichever of the N most recent blueprints aren't the target. It might be nicer if it saved/deleted the N oldest that aren't the target. (That said, then you wouldn't have the information you most likely care about, which is the more recent stuff.)
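To make the first caveat concrete, here is a minimal Rust sketch of a bounded fetch that keeps the N most recent blueprints by creation time and ignores which one is the target. It is not the omdb implementation: `BlueprintRow`, `LimitedFetch`, and `fetch_most_recent` are hypothetical names. It also assumes the tool only knows it got back exactly N rows, which is why it can say truncation *might* have happened but not that it did.

```rust
/// Hypothetical, simplified stand-in for a blueprint row (not omdb's type).
#[derive(Clone, Debug)]
pub struct BlueprintRow {
    pub id: String,
    /// RFC 3339 UTC timestamp such as "2025-10-10T23:43:12.113Z". Strings in
    /// this exact format sort chronologically when compared lexicographically.
    pub time_created: String,
}

/// Result of a bounded fetch (hypothetical).
#[derive(Debug)]
pub struct LimitedFetch {
    /// The blueprints kept, newest first.
    pub blueprints: Vec<BlueprintRow>,
    /// True if the limit was reached, so older blueprints may have been left out.
    pub maybe_truncated: bool,
}

/// Keep at most `nmax` blueprints, newest first. The current target gets no
/// special treatment, mirroring the first caveat above.
pub fn fetch_most_recent(all: &[BlueprintRow], nmax: usize) -> LimitedFetch {
    let mut rows = all.to_vec();
    // Sort newest first by creation time.
    rows.sort_by(|a, b| b.time_created.cmp(&a.time_created));
    // Drop everything past the limit.
    rows.truncate(nmax);
    // Getting exactly `nmax` rows back means there *might* be more: a query
    // capped at `nmax` rows cannot tell "exactly nmax" from "more than nmax".
    let maybe_truncated = rows.len() == nmax;
    LimitedFetch { blueprints: rows, maybe_truncated }
}
```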

I think this is all okay because:

  • In practice, because we run omdb reconfigurator archive regularly, with each update, we should never have that many blueprints in a system we care about. So we'll generally still be grabbing everything.
  • If we do wind up missing stuff, it'll be the oldest blueprints that are missed. For this to not include the current target, something would have to have generated N newer blueprints, never made any of them the target, and not deleted them. No automation does this today and there's no reason to do it.
  • Even if that happens, the worst case is that the file doesn't have all the data you think it does. The tool warns when this might have happened. And no data is lost, because the tool never deletes anything it didn't save.
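The last bullet rests on one invariant: an archive pass deletes only blueprints it has already saved, and never the current target. A minimal Rust sketch of a single pass, assuming the blueprint IDs are already ordered newest first (`ArchivePass` and `archive_pass` are hypothetical names, not omdb's API):

```rust
/// What one hypothetical archive pass saves and deletes.
#[derive(Debug, PartialEq)]
pub struct ArchivePass<'a> {
    pub saved: Vec<&'a str>,
    pub deleted: Vec<&'a str>,
}

/// Sketch of one archive pass. `newest_first` holds every blueprint ID,
/// most recent first; `target` is the current target blueprint's ID.
pub fn archive_pass<'a>(newest_first: &[&'a str], target: &str, nmax: usize) -> ArchivePass<'a> {
    // Save only the `nmax` most recent blueprints, whether or not the
    // target is among them.
    let saved: Vec<&'a str> = newest_first.iter().copied().take(nmax).collect();
    // Delete only blueprints that were saved, and never the target.
    let deleted: Vec<&'a str> = saved.iter().copied().filter(|id| *id != target).collect();
    ArchivePass { saved, deleted }
}
```

In the unlikely case from the first caveat, where the target is older than the N most recent blueprints, this sketch neither saves nor deletes it: the target is missing from the file but still present in the database.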

I consider all of this sort of a stopgap. We really need to do #7278 so that support doesn't have to run tools to archive/prune old blueprints.

davepacheco (Collaborator, Author)

I tested this manually against cargo xtask omicron-dev run-all:

$ cargo xtask omicron-dev run-all
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 0.68s
     Running `target/debug/xtask omicron-dev run-all`
    Finished `dev` profile [unoptimized + debuginfo] target(s) in 1.17s
     Running `target/debug/omicron-dev run-all`
omicron-dev: setting up all services ... 
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.3431.0.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.3431.0.log"
DB URL: postgresql://root@[::1]:58009/omicron?sslmode=disable
DB address: [::1]:58009
log file: /dangerzone/omicron_tmp/omicron-dev-omicron-dev.3431.2.log
note: configured to log to "/dangerzone/omicron_tmp/omicron-dev-omicron-dev.3431.2.log"
omicron-dev: Adding disks to first sled agent
omicron-dev: services are running.
omicron-dev: nexus external API:     127.0.0.1:12220
omicron-dev: nexus internal API:     [::1]:12221
omicron-dev: nexus lockstep API:     [::1]:12232
omicron-dev: cockroachdb pid:        3435
omicron-dev: cockroachdb URL:        postgresql://root@[::1]:58009/omicron?sslmode=disable
omicron-dev: cockroachdb directory:  /dangerzone/omicron_tmp/.tmpQH7Ot1
omicron-dev: clickhouse native addr: [::1]:41270
omicron-dev: clickhouse http addr:   [::1]:61607
omicron-dev: internal DNS HTTP:      http://[::1]:56295
omicron-dev: internal DNS:           [::1]:53113
omicron-dev: external DNS name:      oxide-dev.test
omicron-dev: external DNS HTTP:      http://[::1]:33141
omicron-dev: external DNS:           [::1]:53802
omicron-dev:   e.g. `dig @::1 -p 53802 test-suite-silo.sys.oxide-dev.test`
omicron-dev: management gateway:     http://[::1]:59220 (switch0)
omicron-dev: silo name:              test-suite-silo
omicron-dev: privileged user name:   test-privileged
omicron-dev: privileged password:    oxide
...

Set the OMDB_DNS_SERVER environment variable to the internal DNS address from the output above, so I don't have to keep specifying it:

export OMDB_DNS_SERVER=[::1]:53113

Then I created a sequence of blueprints and made each successive one the target, like this:

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints regenerate -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
generated new blueprint fd72a673-1385-4a51-97ba-82eb8a5f5477

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints list
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
T ENA ID                                   PARENT                               TIME_CREATED             
* no  9ec0a420-fa49-4a10-9060-3185ba5e266b <none>                               2025-10-10T23:36:27.385Z 
      fd72a673-1385-4a51-97ba-82eb8a5f5477 9ec0a420-fa49-4a10-9060-3185ba5e266b 2025-10-10T23:41:02.360Z 

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints target set fd72a673-1385-4a51-97ba-82eb8a5f5477 disabled -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
set target blueprint to fd72a673-1385-4a51-97ba-82eb8a5f5477

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints regenerate -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
generated new blueprint 87722569-b282-4ef3-92d7-4563d5b6f7c8

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints target set 87722569-b282-4ef3-92d7-4563d5b6f7c8 disabled -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
set target blueprint to 87722569-b282-4ef3-92d7-4563d5b6f7c8

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints regenerate -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
generated new blueprint fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints target set fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a disabled -w
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
set target blueprint to fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a

In the end I had this list:

$ ./target/debug/omdb nexus blueprints list
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
T ENA ID                                   PARENT                               TIME_CREATED             
      9ec0a420-fa49-4a10-9060-3185ba5e266b <none>                               2025-10-10T23:36:27.385Z 
      fd72a673-1385-4a51-97ba-82eb8a5f5477 9ec0a420-fa49-4a10-9060-3185ba5e266b 2025-10-10T23:41:02.360Z 
      87722569-b282-4ef3-92d7-4563d5b6f7c8 fd72a673-1385-4a51-97ba-82eb8a5f5477 2025-10-10T23:42:11.016Z 
      fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a 87722569-b282-4ef3-92d7-4563d5b6f7c8 2025-10-10T23:42:18.813Z 
      9c8710a3-d2dd-4013-8825-b57d1a60c1ae fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a 2025-10-10T23:42:24.810Z 
      3debad0c-9caf-430c-aa86-554e189d23f1 9c8710a3-d2dd-4013-8825-b57d1a60c1ae 2025-10-10T23:42:31.683Z 
      bb29f7c2-783a-49d0-8f14-919a98757014 3debad0c-9caf-430c-aa86-554e189d23f1 2025-10-10T23:42:37.167Z 
      8a9ab5bb-3072-492b-9cb6-d221ef77b4f6 bb29f7c2-783a-49d0-8f14-919a98757014 2025-10-10T23:42:51.571Z 
      a974188b-b86f-456b-80c4-7c9f0a25d9c4 8a9ab5bb-3072-492b-9cb6-d221ef77b4f6 2025-10-10T23:42:58.137Z 
      f7e79617-41df-49da-8efe-9bd3e3a4a119 a974188b-b86f-456b-80c4-7c9f0a25d9c4 2025-10-10T23:43:03.757Z 
* no  68fd25eb-0c10-4789-968c-ff776a7d718b f7e79617-41df-49da-8efe-9bd3e3a4a119 2025-10-10T23:43:12.113Z 

First, I'll export with the default (fairly large) limit:

dap@ivanova omicron-fix $ ./target/debug/omdb reconfigurator export export-default.json
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:58009/omicron?sslmode=disable
note: database schema version matches expected (197.0.0)
assembling reconfigurator state ... done
saving to export-default.json ... done

dap@ivanova omicron-fix $ jq '.blueprints[] | .id' < export-default.json
"68fd25eb-0c10-4789-968c-ff776a7d718b"
"f7e79617-41df-49da-8efe-9bd3e3a4a119"
"a974188b-b86f-456b-80c4-7c9f0a25d9c4"
"8a9ab5bb-3072-492b-9cb6-d221ef77b4f6"
"bb29f7c2-783a-49d0-8f14-919a98757014"
"3debad0c-9caf-430c-aa86-554e189d23f1"
"9c8710a3-d2dd-4013-8825-b57d1a60c1ae"
"fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a"
"87722569-b282-4ef3-92d7-4563d5b6f7c8"
"fd72a673-1385-4a51-97ba-82eb8a5f5477"
"9ec0a420-fa49-4a10-9060-3185ba5e266b"

That got all of them, as expected.

Then I'll export with a lower limit:

dap@ivanova omicron-fix $ ./target/debug/omdb reconfigurator export --nmax-blueprints 5 export-limit-5.json
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:58009/omicron?sslmode=disable
note: database schema version matches expected (197.0.0)
assembling reconfigurator state ... done
warning: reached limit of 5 while fetching blueprints
warning: saving only the most recent 5
saving to export-limit-5.json ... done

dap@ivanova omicron-fix $ jq '.blueprints[] | .id' < export-limit-5.json 
"68fd25eb-0c10-4789-968c-ff776a7d718b"
"f7e79617-41df-49da-8efe-9bd3e3a4a119"
"a974188b-b86f-456b-80c4-7c9f0a25d9c4"
"8a9ab5bb-3072-492b-9cb6-d221ef77b4f6"
"bb29f7c2-783a-49d0-8f14-919a98757014"

That saved only the expected number, and only the most recent ones.
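The jq comparison above amounts to checking that the limited export is exactly the first N entries of the full export, both listed newest first. As a small hypothetical Rust helper (`is_most_recent_prefix` is an illustrative name, not part of omdb):

```rust
/// Returns true if `limited` is exactly the `nmax` most recent entries of
/// `full` (both newest first), i.e. the limited export dropped only the
/// oldest blueprints. If `full` has fewer than `nmax` entries, `limited`
/// must contain all of them.
pub fn is_most_recent_prefix(full: &[&str], limited: &[&str], nmax: usize) -> bool {
    let expected_len = nmax.min(full.len());
    limited.len() == expected_len && &full[..expected_len] == limited
}
```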

I wanted to try the workflow of archiving in batches, so I started by archiving with a limit of 8:

dap@ivanova omicron-fix $ ./target/debug/omdb reconfigurator archive --nmax-blueprints 8 archive-limit-8.json -w
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:58009/omicron?sslmode=disable
note: database schema version matches expected (197.0.0)
assembling reconfigurator state ... done
warning: reached limit of 8 while fetching blueprints
warning: saving only the most recent 8
saving to archive-limit-8.json ... done
removing saved, non-target blueprints ...
  successfully deleted blueprint f7e79617-41df-49da-8efe-9bd3e3a4a119
  successfully deleted blueprint a974188b-b86f-456b-80c4-7c9f0a25d9c4
  successfully deleted blueprint 8a9ab5bb-3072-492b-9cb6-d221ef77b4f6
  successfully deleted blueprint bb29f7c2-783a-49d0-8f14-919a98757014
  successfully deleted blueprint 3debad0c-9caf-430c-aa86-554e189d23f1
  successfully deleted blueprint 9c8710a3-d2dd-4013-8825-b57d1a60c1ae
  successfully deleted blueprint fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a
done (7 blueprints deleted)
warning: Only tried deleting the most recent 8 blueprints
warning: because that's all that was fetched and saved.
warning: You may want to run this tool again to archive more.

That file has the expected 8, including the current target, meaning it deleted 7:

dap@ivanova omicron-fix $ jq '.blueprints[] | .id' < archive-limit-8.json 
"68fd25eb-0c10-4789-968c-ff776a7d718b"
"f7e79617-41df-49da-8efe-9bd3e3a4a119"
"a974188b-b86f-456b-80c4-7c9f0a25d9c4"
"8a9ab5bb-3072-492b-9cb6-d221ef77b4f6"
"bb29f7c2-783a-49d0-8f14-919a98757014"
"3debad0c-9caf-430c-aa86-554e189d23f1"
"9c8710a3-d2dd-4013-8825-b57d1a60c1ae"
"fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a"

And they're gone:

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints list
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
T ENA ID                                   PARENT                               TIME_CREATED             
      9ec0a420-fa49-4a10-9060-3185ba5e266b <none>                               2025-10-10T23:36:27.385Z 
      fd72a673-1385-4a51-97ba-82eb8a5f5477 9ec0a420-fa49-4a10-9060-3185ba5e266b 2025-10-10T23:41:02.360Z 
      87722569-b282-4ef3-92d7-4563d5b6f7c8 fd72a673-1385-4a51-97ba-82eb8a5f5477 2025-10-10T23:42:11.016Z 
* no  68fd25eb-0c10-4789-968c-ff776a7d718b f7e79617-41df-49da-8efe-9bd3e3a4a119 2025-10-10T23:43:12.113Z 

If we archive again with the default limit (or any limit of at least 4, the number of blueprints left), we'll fetch them all and delete all but the current target:

dap@ivanova omicron-fix $ ./target/debug/omdb reconfigurator archive archive-default.json -w
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:58009/omicron?sslmode=disable
note: database schema version matches expected (197.0.0)
assembling reconfigurator state ... done
saving to archive-default.json ... done
removing saved, non-target blueprints ...
  successfully deleted blueprint 87722569-b282-4ef3-92d7-4563d5b6f7c8
  successfully deleted blueprint fd72a673-1385-4a51-97ba-82eb8a5f5477
  successfully deleted blueprint 9ec0a420-fa49-4a10-9060-3185ba5e266b
done (3 blueprints deleted)

Again, we've got the latest 4, including the target and three that we deleted:

dap@ivanova omicron-fix $ jq '.blueprints[] | .id' < archive-default.json 
"68fd25eb-0c10-4789-968c-ff776a7d718b"
"87722569-b282-4ef3-92d7-4563d5b6f7c8"
"fd72a673-1385-4a51-97ba-82eb8a5f5477"
"9ec0a420-fa49-4a10-9060-3185ba5e266b"

And all but the target are gone now:

dap@ivanova omicron-fix $ ./target/debug/omdb nexus blueprints list
note: Nexus URL not specified.  Will pick one from DNS.
note: using Nexus URL http://[::1]:12232
T ENA ID                                   PARENT                               TIME_CREATED             
* no  68fd25eb-0c10-4789-968c-ff776a7d718b f7e79617-41df-49da-8efe-9bd3e3a4a119 2025-10-10T23:43:12.113Z 
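The two archive runs above can be reproduced with a small simulation of the batch workflow. This is a hypothetical Rust sketch, not omdb code: each run saves the `nmax` most recent remaining blueprints and deletes the saved ones other than the target, and the operator simply re-runs it.

```rust
/// Hypothetical simulation of archiving in batches. `newest_first` holds
/// every blueprint ID, most recent first; `limits` is the `--nmax-blueprints`
/// value used on each successive run. Returns how many blueprints each run
/// deleted and the IDs left afterward (still newest first).
pub fn archive_in_batches(
    newest_first: &[&str],
    target: &str,
    limits: &[usize],
) -> (Vec<usize>, Vec<String>) {
    let mut remaining: Vec<String> = newest_first.iter().map(|id| id.to_string()).collect();
    let mut deleted_per_run = Vec::new();
    for &nmax in limits {
        // Each run saves only the `nmax` most recent blueprints that remain...
        let saved: Vec<String> = remaining.iter().take(nmax).cloned().collect();
        // ...and deletes the saved ones other than the current target.
        let to_delete: Vec<&String> = saved.iter().filter(|id| id.as_str() != target).collect();
        deleted_per_run.push(to_delete.len());
        remaining.retain(|id| !to_delete.contains(&id));
    }
    (deleted_per_run, remaining)
}
```

With the 11 blueprints from this session, the newest one as target, and limits of 8 and then 1000, the sketch deletes 7 and then 3 blueprints, leaving only the target, which matches the transcript. The value 1000 is an arbitrary stand-in for the default limit, whose actual value isn't given here.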

Here's the history now:

dap@ivanova omicron-fix $ ./target/debug/omdb reconfigurator history
note: database URL not specified.  Will search DNS.
note: (override with --db-url or OMDB_DB_URL)
note: using database URL postgresql://root@[::1]:58009/omicron?sslmode=disable
note: database schema version matches expected (197.0.0)
VERSN TIME                     BLUEPRINT                           
    1 2025-10-10T23:36:27.692Z 9ec0a420-fa49-4a10-9060-3185ba5e266b disabled: blueprint details no longer available
    2 2025-10-10T23:42:03.698Z fd72a673-1385-4a51-97ba-82eb8a5f5477 disabled: blueprint details no longer available
    3 2025-10-10T23:42:16.467Z 87722569-b282-4ef3-92d7-4563d5b6f7c8 disabled: blueprint details no longer available
    4 2025-10-10T23:42:23.595Z fc3fa6c9-5be3-445d-acb5-4ad2dc79e05a disabled: blueprint details no longer available
    5 2025-10-10T23:42:29.405Z 9c8710a3-d2dd-4013-8825-b57d1a60c1ae disabled: blueprint details no longer available
    6 2025-10-10T23:42:35.700Z 3debad0c-9caf-430c-aa86-554e189d23f1 disabled: blueprint details no longer available
    7 2025-10-10T23:42:42.136Z bb29f7c2-783a-49d0-8f14-919a98757014 disabled: blueprint details no longer available
    8 2025-10-10T23:42:56.661Z 8a9ab5bb-3072-492b-9cb6-d221ef77b4f6 disabled: blueprint details no longer available
    9 2025-10-10T23:43:02.527Z a974188b-b86f-456b-80c4-7c9f0a25d9c4 disabled: blueprint details no longer available
   10 2025-10-10T23:43:08.388Z f7e79617-41df-49da-8efe-9bd3e3a4a119 disabled: blueprint details no longer available
   11 2025-10-10T23:43:16.566Z 68fd25eb-0c10-4789-968c-ff776a7d718b disabled: 

I think this is all working as expected.

davepacheco marked this pull request as ready for review October 11, 2025 00:12